Spatial Planning and Connectivity Corpus - Technical Background Report

IPBES Spatial Planning and Connectivity Assessment

Authors

Rainer M. Krug

Gabriella Bishop

Sebastian Villasante

Abstract

To Be added

DOI GitHub License: CC BY 4.0

Disclaimer

Contributors

Assessment Experts

  • xxx, yyy ORCID

Data and Knowledge TSU

  • Niamir, Aidin ORCID

Working Title

IPBES_SPC_Corpus

Code repo

GitHub repository

Build No: 55

Introduction

The literature search for the Spatial Planning and Connectivity assessment corpus was conducted using search terms provided by the experts and refined in co-operation with the IPBES task force for data and knowledge management. The search was conducted using OpenAlex, scripted from R to use the OpenAlex API. Search terms for the following searches were defined:

  • Spatial Planning and Connectivity
  • Nature / Environment
  • additional search terms for specific corpora

To assess the quality of the corpus, the experts selected sets of key papers, which were then checked for their presence in the corpus.

The following terminology is used in this document:

  • Corpus: a body of works resulting from a search on OpenAlex
  • Spatial Planning and Connectivity Assessment Corpus (short: SPC corpus): the corpus resulting from the search terms TO BE ADDED
  • Work: the term OpenAlex uses for a single document in its dataset. Each work has a unique OpenAlex id, but not necessarily a DOI.

The following searches are conducted on title and abstract only, both to avoid the fluctuating availability of full-text search and to keep the search focused.
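
As a rough illustration of how two term lists can be combined into a single title-and-abstract search string, the sketch below joins the terms within each list with OR and the two lists with AND. The term lists and the helper `or_block()` are hypothetical and only serve the example; the actual terms are defined in the YAML files of the repository and may already contain their own operators.

```r
# Hypothetical term lists (assumption: NOT the real SPC / nature terms).
spc_terms    <- c('"spatial planning"', '"connectivity"')
nature_terms <- c("biodiversity", "ecosystem")

# Join the terms of one list with OR and wrap the result in parentheses.
or_block <- function(terms) {
  paste0("(", paste(terms, collapse = " OR "), ")")
}

# A work has to match at least one term from each list.
base_query <- paste(or_block(spc_terms), or_block(nature_terms), sep = " AND ")
print(base_query)
```

Quoting a multi-word term keeps it together as a phrase, which is why the hypothetical SPC terms carry their own double quotes.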

Schematic Overview

Overview

flowchart TD
    Start([Start literature search]) --> SPC["spc_corpus.yaml<br/>Assemble base SPC corpus"]
    click SPC "./input/search_terms/spc_corpus.yaml" "Open spc_corpus.yaml"
    SPC --> SPC_list["spc keyword set<br/>(planning & connectivity terms)"]
    SPC --> NATURE_list["nature keyword set<br/>(environmental context terms)"]
    SPC_list --> BaseQuery["Level 1 query<br/>spc terms AND nature dictionary"]
    NATURE_list --> BaseQuery

    BaseQuery --> ChapterSelect{Apply chapter / theme refinements}

    ChapterSelect --> CH1["Chapter 1<br/>chapter_1.v1.yaml<br/>Governance & planning principles"]
    click CH1 "./input/search_terms/chapter_1.v1.yaml" "Open chapter_1.v1.yaml"
    ChapterSelect --> CH2["Chapter 2<br/>chapter_2.yaml + chapter_2_add.yaml + chapter_2_sdg.yaml<br/>GBF targets, nexus themes, SDGs"]
    click CH2 "./input/search_terms/chapter_2.yaml" "Open chapter_2.yaml"
    ChapterSelect --> CH3["Chapter 3<br/>chapter_3.yaml<br/>Restoration & conservation planning"]
    click CH3 "./input/search_terms/chapter_3.yaml" "Open chapter_3.yaml"
    ChapterSelect --> CH4["Chapter 4<br/>chapter_4.yaml<br/>Connectivity evidence & metrics"]
    click CH4 "./input/search_terms/chapter_4.yaml" "Open chapter_4.yaml"
    ChapterSelect --> CH5["Chapter 5<br/>chapter_5_* files<br/>Foresight & futures (sections, themes, cross-cutting)"]
    click CH5 "./input/search_terms/Chapter_5_1_2.yaml" "Open Chapter 5 search terms"
    ChapterSelect --> CH6["Chapter 6<br/>chapter_6.yaml (+ chapter_6_r2.yaml optional)<br/>Enabling environment"]
    click CH6 "./input/search_terms/chapter_6.yaml" "Open chapter_6.yaml"
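
The layering in the flowchart above (a Level 1 base query narrowed by chapter sets, optionally narrowed again) can be sketched as nested AND combinations. This is only a sketch under stated assumptions: the query strings, set names and the helper `refine()` are hypothetical, and the way the repository actually assembles the levels may differ.

```r
# Hypothetical Level 1 query and chapter sets (assumption: not the real terms).
level_1 <- '("spatial planning") AND (biodiversity)'
chapter_sets <- c(
  set_1 = '"adaptive planning" OR "scenario planning"',
  set_2 = '"participatory planning" OR "co-design"'
)

# Narrow an existing query by an additional block of terms.
refine <- function(base, set) paste0("(", base, ") AND (", set, ")")

# Level 2: one refined query per chapter set; the set names are kept.
level_2 <- vapply(chapter_sets, refine, character(1), base = level_1)

# Level 3: an optional further filter applied to one Level 2 query.
level_3 <- refine(level_2[["set_1"]], '"case study" OR initiative')
print(level_3)
```

Because every level is wrapped in its own parentheses, adding a level can only shrink the result set, never widen it.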

Chapter 1

flowchart LR
    Start([SPC Corpus]) --> Ch1["chapter_1.v1.yaml<br/>Level 2 refinements"]
    click Ch1 "./input/search_terms/chapter_1.v1.yaml" "Open chapter_1.v1.yaml"

    subgraph Chapter1Sets["Chapter 1 thematic searches"]
        direction TB
        C1_1["Set 1:<br/>land/spatial planning<br/>+ biodiversity goals<br/>+ societal needs/values"]
        C1_2["Set 2:<br/>adaptive/scenario planning<br/>+ monitoring/feedback"]
        C1_3["Set 3:<br/>evidence & precaution<br/>+ ILK knowledge base"]
        C1_4["Set 4:<br/>multilevel/transparent governance<br/>+ customary coherence"]
        C1_5["Set 5:<br/>participatory planning<br/>+ co-design / engagement"]
        C1_6["Set 6:<br/>equity / rights / tenure<br/>+ justice outcomes"]
        C1_7["Set 7:<br/>connectivity (land-sea/cross-scale)<br/>+ nexus & climate links"]
    end

    Ch1 --> C1_1
    Ch1 --> C1_2
    Ch1 --> C1_3
    Ch1 --> C1_4
    Ch1 --> C1_5
    Ch1 --> C1_6
    Ch1 --> C1_7

Chapter 2

flowchart LR
    Start([SPC Corpus]) --> Ch2L2["chapter_2.yaml<br/>Level 2 GBF bundles"]
    click Ch2L2 "../search_terms/chapter_2.yaml" "Open chapter_2.yaml"

    subgraph L1["GBF contexts & targets"]
        direction TB
        GBF_Urban["GBF-1 Urban"]
        GBF_Rural["GBF-1 Rural"]
        GBF_Fresh["GBF-1 Freshwater"]
        GBF_Marine["GBF-1 Marine"]
        GBF_Restore["GBF-2 Ecosystem restoration"]
        T3["Target 3"]
        T4["Target 4"]
        T5["Target 5"]
        T6["Target 6"]
        T7["Target 7"]
        T8["Target 8"]
        T9["Target 9"]
        T10["Target 10"]
        T11["Target 11"]
        T12["Target 12"]
        T13["Target 13"]
        T14["Target 14"]
        T15["Target 15"]
        T16["Target 16"]
        T17["Target 17"]
        T18["Target 18"]
        T19["Target 19"]
        T20["Target 20"]
        T21["Target 21"]
        T22["Target 22"]
        T23["Target 23"]
        REL["Spatial Planning Related"]
    end
    Ch2L2 --> GBF_Urban
    Ch2L2 --> GBF_Rural
    Ch2L2 --> GBF_Fresh
    Ch2L2 --> GBF_Marine
    Ch2L2 --> GBF_Restore
    Ch2L2 --> T3
    Ch2L2 --> T4
    Ch2L2 --> T5
    Ch2L2 --> T6
    Ch2L2 --> T7
    Ch2L2 --> T8
    Ch2L2 --> T9
    Ch2L2 --> T10
    Ch2L2 --> T11
    Ch2L2 --> T12
    Ch2L2 --> T13
    Ch2L2 --> T14
    Ch2L2 --> T15
    Ch2L2 --> T16
    Ch2L2 --> T17
    Ch2L2 --> T18
    Ch2L2 --> T19
    Ch2L2 --> T20
    Ch2L2 --> T21
    Ch2L2 --> T22
    Ch2L2 --> T23
    Ch2L2 --> REL

    L1 --> Ch2L3["chapter_2_add.yaml<br/>Level 3 nexus themes"]
    click Ch2L3 "../search_terms/chapter_2_add.yaml" "Open chapter_2_add.yaml"
    subgraph NexusSets["Nexus add-ons"]
        direction TB
        Nexus_Water["Water"]
        Nexus_Food["Food"]
        Nexus_Health["Health"]
        Nexus_Climate["Climate"]
    end
    Ch2L3 --> Nexus_Water
    Ch2L3 --> Nexus_Food
    Ch2L3 --> Nexus_Health
    Ch2L3 --> Nexus_Climate

    L1 --> Ch2L4["chapter_2_sdg.yaml<br/>Level 4 SDG filters"]
    click Ch2L4 "../search_terms/chapter_2_sdg.yaml" "Open chapter_2_sdg.yaml"
    subgraph SDGSets["SDG goal filters"]
        direction TB
        SDG1["SDG 1"]
        SDG2["SDG 2"]
        SDG3["SDG 3"]
        SDG4["SDG 4"]
        SDG5["SDG 5"]
        SDG6["SDG 6"]
        SDG7["SDG 7"]
        SDG8["SDG 8"]
        SDG9["SDG 9"]
        SDG10["SDG 10"]
        SDG11["SDG 11"]
        SDG12["SDG 12"]
        SDG13["SDG 13"]
        SDG14["SDG 14"]
        SDG15["SDG 15"]
        SDG16["SDG 16"]
        SDG17["SDG 17"]
    end
    Ch2L4 --> SDG1
    Ch2L4 --> SDG2
    Ch2L4 --> SDG3
    Ch2L4 --> SDG4
    Ch2L4 --> SDG5
    Ch2L4 --> SDG6
    Ch2L4 --> SDG7
    Ch2L4 --> SDG8
    Ch2L4 --> SDG9
    Ch2L4 --> SDG10
    Ch2L4 --> SDG11
    Ch2L4 --> SDG12
    Ch2L4 --> SDG13
    Ch2L4 --> SDG14
    Ch2L4 --> SDG15
    Ch2L4 --> SDG16
    Ch2L4 --> SDG17

Chapter 3

flowchart LR
    Start([SPC Corpus]) --> Ch3["chapter_3.yaml<br/>Level 2 refinements"]
    click Ch3 "./input/search_terms/chapter_3.yaml" "Open chapter_3.yaml"

    subgraph Chapter3Sets["Chapter 3 searches"]
        direction TB
        C3_1["Set 1:<br/>Protected/OECM + NBSAP + cases<br/>+ spatial prioritization + regional scales"]
        C3_2["Set 2:<br/>Restoration types + inclusivity & ILK"]
        C3_3["Set 3:<br/>Restoration planning + connectivity + resilience"]
        C3_4["Set 4:<br/>Systematic conservation planning / gap analysis"]
        C3_5["Set 5:<br/>Protected area & connectivity planning"]
        C3_6["Set 6:<br/>Landscape/species/corridor networks"]
        C3_7["Set 7:<br/>Conservation planning + ecosystem services"]
        C3_8["Set 8:<br/>Participatory conservation mapping"]
        C3_9["Set 9:<br/>Conservation effectiveness + spatial planning"]
        C3_10["Set 10:<br/>Adaptive management under global change drivers"]
    end
    Ch3 --> C3_1
    Ch3 --> C3_2
    Ch3 --> C3_3
    Ch3 --> C3_4
    Ch3 --> C3_5
    Ch3 --> C3_6
    Ch3 --> C3_7
    Ch3 --> C3_8
    Ch3 --> C3_9
    Ch3 --> C3_10

Chapter 4

flowchart LR
    Start([SPC Corpus]) --> Ch4["chapter_4.yaml<br/>Level 2 refinements"]
    click Ch4 "./input/search_terms/chapter_4.yaml" "Open chapter_4.yaml"

    subgraph Chapter4Sets["Chapter 4 searches"]
        direction TB
        C4_1["Set 1:<br/>Connectivity review articles"]
        C4_2["Set 2:<br/>Connectivity benefits vs risks<br/>+ ecosystem services"]
        C4_3["Set 3:<br/>Structural vs functional connectivity"]
        C4_4["Set 4:<br/>Connectivity modelling toolkits"]
        C4_5["Set 5:<br/>Connectivity indicators & metrics"]
        C4_6["Set 6:<br/>Policy & governance integration"]
        C4_7["Set 7:<br/>Multilevel / transboundary governance<br/>+ IPLC inclusion + land tenure"]
        C4_8["Set 8:<br/>Movement ecology<br/>(dispersal / migration / permeability)"]
    end
    Ch4 --> C4_1
    Ch4 --> C4_2
    Ch4 --> C4_3
    Ch4 --> C4_4
    Ch4 --> C4_5
    Ch4 --> C4_6
    Ch4 --> C4_7
    Ch4 --> C4_8

Chapter 5

flowchart LR
    Start([SPC Corpus]) --> Ch5L2["Chapter_5_1_2.yaml<br/>Level 2 foresight framing"]
    click Ch5L2 "./input/search_terms/Chapter_5_1_2.yaml" "Open Chapter_5_1_2.yaml"

    subgraph L2Sets["Section 5.1–5.2 searches"]
        direction TB
        C5_1["Set 1:<br/>Future + project*/predict*/scenario"]
        C5_2["Set 2:<br/>Future + pathway*/narrative*/vision*"]
        C5_3["Set 3:<br/>Future-proof*/anticipat*/scenario planning"]
        C5_4["Set 4:<br/>Foresight*/backcasting/simulat*/trend*"]
        C5_5["Set 5:<br/>Model* AND scenario"]
    end
    Ch5L2 --> C5_1
    Ch5L2 --> C5_2
    Ch5L2 --> C5_3
    Ch5L2 --> C5_4
    Ch5L2 --> C5_5

    Start --> Ch5L3["chapter_5_3.yaml<br/>Level 3 – Drivers of change"]
    click Ch5L3 "./input/search_terms/chapter_5_3.yaml" "Open chapter_5_3.yaml"
    subgraph L3Sets["Section 5.3 searches"]
        direction TB
        C5_3a["Set 1:<br/>Drivers of change"]
        C5_3b["Set 2:<br/>Driver modelling approaches"]
        C5_3c["Set 3:<br/>Driver gaps"]
    end
    Ch5L3 --> C5_3a
    Ch5L3 --> C5_3b
    Ch5L3 --> C5_3c

    Start --> Ch5L4["chapter_5_4.yaml<br/>Level 4 – Synergies & trade-offs"]
    click Ch5L4 "./input/search_terms/chapter_5_4.yaml" "Open chapter_5_4.yaml"
    subgraph L4Sets["Section 5.4 searches"]
        direction TB
        C5_4a["Set 1:<br/>Interactions (synergy/trade-off/nexus)"]
        C5_4b["Set 2:<br/>Response options (integrated planning / NbS)"]
        C5_4c["Set 3:<br/>Cross-scale synergy & trade-off terms"]
    end
    Ch5L4 --> C5_4a
    Ch5L4 --> C5_4b
    Ch5L4 --> C5_4c

    Start --> Ch5L5["chapter_5_5.yaml<br/>Level 5 – Uncertainty & risk"]
    click Ch5L5 "./input/search_terms/chapter_5_5.yaml" "Open chapter_5_5.yaml"
    subgraph L5Sets["Section 5.5 searches"]
        direction TB
        C5_5a["Set 1:<br/>Adaptive / transformative management"]
        C5_5b["Set 2:<br/>Uncertainty quantification"]
        C5_5c["Set 3:<br/>Tipping points & thresholds"]
        C5_5d["Set 4:<br/>Cascading risks & precaution"]
    end
    Ch5L5 --> C5_5a
    Ch5L5 --> C5_5b
    Ch5L5 --> C5_5c
    Ch5L5 --> C5_5d

    Start --> Ch5L6["chapter_5_6.yaml<br/>Level 6 – Knowledge to action"]
    click Ch5L6 "./input/search_terms/chapter_5_6.yaml" "Open chapter_5_6.yaml"
    subgraph L6Sets["Section 5.6 searches"]
        direction TB
        C5_6a["Set 1:<br/>Science-policy-practice pathways"]
        C5_6b["Set 2:<br/>ILK integration & community planning"]
        C5_6c["Set 3:<br/>Enabling factors & coordination"]
        C5_6d["Set 4:<br/>Shocks, surprises, uncertainties"]
    end
    Ch5L6 --> C5_6a
    Ch5L6 --> C5_6b
    Ch5L6 --> C5_6c
    Ch5L6 --> C5_6d

    Start --> Ch5CC["chapter_5_cc.yaml<br/>Cross-cutting themes"]
    click Ch5CC "./input/search_terms/chapter_5_cc.yaml" "Open chapter_5_cc.yaml"
    subgraph CCSets["Cross-cutting searches"]
        direction TB
        CC1["Set 1:<br/>Scales & telecoupling"]
        CC2["Set 2:<br/>Co-production & inclusion"]
    end
    Ch5CC --> CC1
    Ch5CC --> CC2

Chapter 6

flowchart LR
    Start([SPC Corpus]) --> Ch6["chapter_6.yaml<br/>Level 2 refinements"]
    click Ch6 "./input/search_terms/chapter_6.yaml" "Open chapter_6.yaml"

    subgraph Chapter6Sets["Chapter 6 searches"]
        direction TB
        C6_1["Set 1:<br/>Institutional & governance structures"]
        C6_2["Set 2:<br/>Political & strategic leadership"]
        C6_3["Set 3:<br/>Socio-cultural & stakeholder engagement"]
        C6_4["Set 4:<br/>Collaboration, trust & networks"]
        C6_5["Set 5:<br/>Financial & economic mechanisms"]
        C6_6["Set 6:<br/>Legal & policy frameworks"]
        C6_7["Set 7:<br/>Human & institutional capacity"]
        C6_8["Set 8:<br/>Data, knowledge & decision support"]
        C6_9["Set 9:<br/>Ecological & spatial planning tools"]
        C6_10["Set 10:<br/>Cross-cutting process enablers"]
    end
    Ch6 --> C6_1
    Ch6 --> C6_2
    Ch6 --> C6_3
    Ch6 --> C6_4
    Ch6 --> C6_5
    Ch6 --> C6_6
    Ch6 --> C6_7
    Ch6 --> C6_8
    Ch6 --> C6_9
    Ch6 --> C6_10

    Ch6R2["chapter_6_r2.yaml<br/>Optional Level 3 filter"]
    Chapter6Sets --> Ch6R2
    click Ch6R2 "./input/search_terms/chapter_6_r2.yaml" "Open chapter_6_r2.yaml"
    subgraph Chapter6Case["Chapter 6 searches"]
        R2["Case-study keywords<br/>(case stud*, example*, initiative*, etc.)"]
    end
    Ch6R2 --> R2

Type Selection

OpenAlex contains more than 270 million works of different types. The following table lists and explains the available types and shows which of them are included in the SPC Corpus.

params$types |>
  knitr::kable(
    caption = "OpenAlex Work Types and Inclusion in the SPC Corpus",
    booktabs = TRUE,
    align = c("l", "l", "l", "c")
  ) # |>
# kableExtra::kable_styling(
#     full_width = FALSE,
#     position = "left"
# )

OpenAlex Work Types and Inclusion in the SPC Corpus

| Type | Description | Included |
|---|---|---|
| article | Scholarly journal articles and related periodical works | TRUE |
| book | Monographs and other long-form published books | TRUE |
| book-chapter | Chapters published within edited books or proceedings | TRUE |
| dissertation | Doctoral or master level theses and dissertations | TRUE |
| preprint | Pre-publication manuscripts shared prior to peer review | TRUE |
| report | Technical, institutional, or policy reports | TRUE |
| review | Narrative or systematic review articles | TRUE |
| dataset | Published datasets and structured data releases | FALSE |
| editorial | Editorials and editor introductions | FALSE |
| erratum | Published corrections to previously released works | FALSE |
| grant | Summaries or descriptions of grant-funded projects | FALSE |
| letter | Correspondence, commentaries, and short letters | FALSE |
| libguides | Library research guides and curated bibliographies | FALSE |
| other | Works that do not align with a more specific OpenAlex type | FALSE |
| paratext | Prefaces, introductions, indexes, and other paratextual items | FALSE |
| peer-review | Formal peer review reports and evaluations | FALSE |
| reference-entry | Encyclopaedia or dictionary reference entries | FALSE |
| retraction | Notices retracting previously published works | FALSE |
| standard | Standards, protocols, and technical specifications | FALSE |
| supplementary-materials | Supplementary files accompanying primary publications | FALSE |
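
The inclusion flags can be turned into a vector of type names for use as a filter. The sketch below shows one way to do this in base R; the data frame is a hand-typed subset of the table, and it is only an assumption that `params$types_filter`, which is used in the later chunks, is derived in a similar way.

```r
# Hand-typed subset of the type table (assumption: the column names are illustrative).
types <- data.frame(
  type = c(
    "article", "book", "book-chapter", "dissertation",
    "preprint", "report", "review", "dataset", "editorial"
  ),
  included = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Keep only the names of the types flagged for inclusion.
types_filter <- types$type[types$included]
print(types_filter)
```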

Methods

Assessment of Individual Terms in the spc and nature Search Terms

This assessment is run against the whole OpenAlex corpus. It is filtered by type only, not by date range.

fn <- file.path(params$output_dir, "searchterm_assessment_spcc.rds")

# Run the assessment only once; later renders reuse the cached file.
if (!file.exists(fn)) {
  list(
    # Assess each spc term, with the nature search term as and_term
    spc = assess_search_term_both(
      st = params$search_terms$spc,
      and_term = st(params$search_terms$nature),
      types = params$types_filter,
      verbose = FALSE
    ),
    # Assess each nature term, with the spc search term as and_term
    nature = assess_search_term_both(
      st = params$search_terms$nature,
      and_term = st(params$search_terms$spc),
      types = params$types_filter,
      verbose = FALSE
    )
  ) |>
    saveRDS(fn)
}
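
The helper `assess_search_term_both()` is not shown in this report. Judging from the result tables, where spelling variants such as "land use planning" and "land-use planning" share the same count and have an exclusive count of 0, one plausible reading is that `count` is the number of works a term matches, while `count_excl` is the number of works matched by that term and by no other term in the same list. The sketch below implements that reading on made-up data; it is an assumption, not the helper's actual implementation.

```r
# Hypothetical hits: ids of the works matched by each term (made-up data).
hits <- list(
  "land use planning" = c("W1", "W2", "W3"),
  "land-use planning" = c("W1", "W2", "W3"),
  "zoning"            = c("W3", "W4", "W5")
)

assess <- function(hits) {
  # count: all works matched by the term
  count <- lengths(hits)
  # count_excl: works matched by this term and by no other term
  count_excl <- vapply(
    names(hits),
    function(term) {
      others <- unique(unlist(hits[names(hits) != term]))
      length(setdiff(hits[[term]], others))
    },
    integer(1)
  )
  data.frame(
    term = names(hits),
    count = unname(count),
    count_excl = unname(count_excl)
  )
}

result <- assess(hits)
print(result)
```

Under this reading, a term with an exclusive count of 0 could be dropped without losing any work from the corpus.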

Get Key Paper

Here the key papers are retrieved from OpenAlex and stored in a Parquet dataset, which is partitioned by:

  • found_in: the filter that was applied, either a search term (spc, nature, spcc) or openalex (no search term); a key paper found under a search term would be part of the corpus resulting from that search term
  • id_used: the identifier used for the lookup, either the OpenAlex id (oa_id) or the DOI (doi)
  • page: used for processing only

No filtering is applied, neither by type nor by publication year.

The other columns are as returned by the OpenAlex API.

fn <- file.path(params$keyworks, "parquet")
if (!dir.exists(fn)) {
  # oa
  st <- list(
    spc = params$search_terms$spc |>
      paste0(collapse = " "),
    nature = params$search_terms$nature |>
      paste0(collapse = " ")
  )
  st$spcc <- paste0("(", st$spc, ") AND (", st$nature, ")")

  dois <- params$key_papers$goldstandard$DOI[
    params$key_papers$goldstandard$DOI != ""
  ]
  ids <- params$key_papers$goldstandard$openalex_id[
    params$key_papers$goldstandard$openalex_id != ""
  ]

  ### KP in OpenAlex
  openalexPro::pro_query(
    doi = dois,
    chunk_limit = 50
  ) |>
    openalexPro::pro_request(
      output = file.path(fn, "..", "json_doi")
    ) |>
    openalexPro::pro_request_jsonl(
      output = file.path(fn, "..", "jsonl_doi"),
      delete_input = TRUE
    ) |>
    openalexPro::pro_request_jsonl_parquet(
      output = file.path(fn, "found_in=openalex", "id_used=doi"),
      delete_input = TRUE
    )

  openalexPro::pro_query(
    id = ids,
    multiple_id = TRUE,
    chunk_limit = 50
  ) |>
    openalexPro::pro_request(
      output = file.path(fn, "..", "json_oa_id")
    ) |>
    openalexPro::pro_request_jsonl(
      output = file.path(fn, "..", "jsonl_oa_id"),
      delete_input = TRUE
    ) |>
    openalexPro::pro_request_jsonl_parquet(
      output = file.path(fn, "found_in=openalex", "id_used=oa_id"),
      delete_input = TRUE
    )

  ### KP in spc
  openalexPro::pro_query(
    title_and_abstract.search = st$spc,
    doi = dois,
    chunk_limit = 25
  ) |>
    openalexPro::pro_request(
      output = file.path(fn, "..", "json_doi")
    ) |>
    openalexPro::pro_request_jsonl(
      output = file.path(fn, "..", "jsonl_doi"),
      delete_input = TRUE
    ) |>
    openalexPro::pro_request_jsonl_parquet(
      output = file.path(fn, "found_in=spc", "id_used=doi"),
      delete_input = TRUE
    )

  openalexPro::pro_query(
    title_and_abstract.search = st$spc,
    id = ids,
    multiple_id = TRUE,
    chunk_limit = 25
  ) |>
    openalexPro::pro_request(output = file.path(fn, "json_oa_id")) |>
    openalexPro::pro_request_jsonl(
      output = file.path(fn, "jsonl_oa_id"),
      delete_input = TRUE
    ) |>
    openalexPro::pro_request_jsonl_parquet(
      output = file.path(fn, "found_in=spc", "id_used=oa_id"),
      delete_input = TRUE
    )

  ### KP in nature
  openalexPro::pro_query(
    title_and_abstract.search = st$nature,
    doi = dois,
    chunk_limit = 25
  ) |>
    openalexPro::pro_request(output = file.path(fn, "json_doi")) |>
    openalexPro::pro_request_jsonl(
      output = file.path(fn, "jsonl_doi"),
      delete_input = TRUE
    ) |>
    openalexPro::pro_request_jsonl_parquet(
      output = file.path(fn, "found_in=nature", "id_used=doi"),
      delete_input = TRUE
    )

  openalexPro::pro_query(
    title_and_abstract.search = st$nature,
    id = ids,
    multiple_id = TRUE,
    chunk_limit = 25
  ) |>
    openalexPro::pro_request(output = file.path(fn, "json_oa_id")) |>
    openalexPro::pro_request_jsonl(
      output = file.path(fn, "jsonl_oa_id"),
      delete_input = TRUE
    ) |>
    openalexPro::pro_request_jsonl_parquet(
      output = file.path(fn, "found_in=nature", "id_used=oa_id"),
      delete_input = TRUE
    )

  ### KP in spcc
  openalexPro::pro_query(
    title_and_abstract.search = st$spcc,
    doi = dois,
    chunk_limit = 25
  ) |>
    openalexPro::pro_request(output = file.path(fn, "json_doi")) |>
    openalexPro::pro_request_jsonl(
      output = file.path(fn, "jsonl_doi"),
      delete_input = TRUE
    ) |>
    openalexPro::pro_request_jsonl_parquet(
      output = file.path(fn, "found_in=spcc", "id_used=doi"),
      delete_input = TRUE
    )

  openalexPro::pro_query(
    title_and_abstract.search = st$spcc,
    id = ids,
    multiple_id = TRUE,
    chunk_limit = 25
  ) |>
    openalexPro::pro_request(output = file.path(fn, "json_oa_id")) |>
    openalexPro::pro_request_jsonl(
      output = file.path(fn, "jsonl_oa_id"),
      delete_input = TRUE
    ) |>
    openalexPro::pro_request_jsonl_parquet(
      output = file.path(fn, "found_in=spcc", "id_used=oa_id"),
      delete_input = TRUE
    )
}
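
The eight requests above fill a regular grid: four values of found_in times two values of id_used. The sketch below builds the corresponding Hive-style partition directories (`<column>=<value>`) in base R and shows how the partition values can be recovered from a path. The root directory and the helper `parse_partition()` are hypothetical, and the page partition is left out.

```r
# Hypothetical root; the report uses file.path(params$keyworks, "parquet").
root <- file.path("keyworks", "parquet")

# All combinations of the two partition columns.
partitions <- expand.grid(
  id_used = c("doi", "oa_id"),
  found_in = c("openalex", "spc", "nature", "spcc"),
  stringsAsFactors = FALSE
)

# Hive-style partition directories: <column>=<value>
partitions$path <- file.path(
  root,
  paste0("found_in=", partitions$found_in),
  paste0("id_used=", partitions$id_used)
)

# Recover the value of one partition column from a path.
parse_partition <- function(path, key) {
  sub(paste0("^.*", key, "=([^/]+).*$"), "\\1", path)
}

print(partitions$path)
```

A dataset reader that understands this layout exposes found_in and id_used as ordinary columns, which is what the next step relies on.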

Key Papers in Search Terms

The works retrieved in the previous step are analysed here to produce a table showing in which corpora each key paper occurs.

fn <- file.path(params$keyworks, "kp_found_in.rds")
if (!file.exists(fn)) {
  arrow::open_dataset(file.path(params$keyworks, "parquet")) |>
    dplyr::select(
      id,
      doi,
      type,
      found_in,
      title,
      citation
    ) |>
    dplyr::group_by(id, doi, title, citation, type) |>
    dplyr::summarise(
      in_openalex = base::max(found_in == "openalex", na.rm = TRUE),
      in_spc = base::max(found_in == "spc", na.rm = TRUE),
      in_nature = base::max(found_in == "nature", na.rm = TRUE),
      in_spcc = base::max(found_in == "spcc", na.rm = TRUE),
      .groups = "drop"
    ) |>
    dplyr::collect() |>
    saveRDS(fn)
}
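
The aggregation above collapses one row per (work, found_in) pair into one row per work with a flag per corpus. A base R sketch of the same idea on made-up data follows; it does not use arrow or dplyr, and the ids are hypothetical.

```r
# Hypothetical rows as they might come out of the partitioned dataset.
kp <- data.frame(
  id = c("W1", "W1", "W1", "W2", "W2", "W3"),
  found_in = c("openalex", "spc", "spcc", "openalex", "nature", "openalex"),
  stringsAsFactors = FALSE
)

corpora <- c("openalex", "spc", "nature", "spcc")

# Cross-tabulate id x corpus; a count above zero means "found".
found <- unclass(table(kp$id, factor(kp$found_in, levels = corpora))) > 0

kp_found_in <- data.frame(
  id = rownames(found),
  in_openalex = unname(found[, "openalex"]),
  in_spc = unname(found[, "spc"]),
  in_nature = unname(found[, "nature"]),
  in_spcc = unname(found[, "spcc"]),
  row.names = NULL
)
print(kp_found_in)
```

Fixing the factor levels guarantees a column for every corpus, even one in which no key paper was found.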

Get Numbers from OpenAlex of the Search Terms

These data are gathered directly from OpenAlex, without downloading any works. They are used to assess the quality of the SPC Corpus.

The query contains:

  • the search term (nature, spc, spcc)
  • the selected types (article, book, book-chapter, dissertation, preprint, report, review)
  • the date range (from 1992-01-01 to 2025-12-31)
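
For orientation, the three query components can be written as a single OpenAlex `filter` parameter, in which different filters are separated by commas and alternative values of one filter by `|`. The sketch below composes such a URL in base R. The search string is a hypothetical stand-in for the real terms, and the URL that `openalexPro::pro_query()` builds internally may differ in its details.

```r
# Hypothetical search string (assumption: not the real spc / nature terms).
search <- '("spatial planning") AND (biodiversity)'

# Types and date range as stated in this section.
types <- c(
  "article", "book", "book-chapter", "dissertation",
  "preprint", "report", "review"
)
date_from <- "1992-01-01"
date_to <- "2025-12-31"

# Filters are joined by ","; alternative values of one filter by "|".
filter <- paste(
  paste0("title_and_abstract.search:", search),
  paste0("type:", paste(types, collapse = "|")),
  paste0("from_publication_date:", date_from),
  paste0("to_publication_date:", date_to),
  sep = ","
)

# Percent-encode spaces and quotes, but keep reserved characters such as ":" and ",".
url <- paste0(
  "https://api.openalex.org/works?filter=",
  utils::URLencode(filter, reserved = FALSE)
)
print(url)
```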

The following counts are retrieved:

Overall hits

fn <- file.path(params$corpus, "st_hits.rds")
if (!file.exists(fn)) {
  st <- list(
    spc = params$search_terms$spc |>
      paste0(collapse = " "),
    nature = params$search_terms$nature |>
      paste0(collapse = " ")
  )
  st$spcc <- paste0("(", st$spc, ") AND (", st$nature, ")")

  queries <- lapply(
    st,
    function(s) {
      openalexPro::pro_query(
        title_and_abstract.search = s,
        type = params$types_filter,
        from_publication_date = params$publication_date$from,
        to_publication_date = params$publication_date$to
      )
    }
  )
  queries$openalex <- openalexPro::pro_query(
    type = params$types_filter,
    from_publication_date = params$publication_date$from,
    to_publication_date = params$publication_date$to
  )

  pbapply::pblapply(
    queries,
    function(query) {
      query |>
        openalexR::oa_request(
          count_only = TRUE,
          verbose = TRUE
        ) |>
        unlist()
    }
  ) |>
    do.call(what = cbind) |>
    t() |>
    as.data.frame() |>
    dplyr::select(count) |>
    saveRDS(file = fn)
}
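
The reshaping at the end of this chunk turns a list of per-query responses into a one-column table with one row per query. The sketch below replays those steps on made-up responses. The field names are an assumption about what a count-only request returns, and the numbers are invented.

```r
# Hypothetical count-only responses, one per query (all numbers are made up).
responses <- list(
  spc = list(count = 1200, db_response_time_ms = 35, page = 1, per_page = 1),
  nature = list(count = 9800, db_response_time_ms = 41, page = 1, per_page = 1),
  spcc = list(count = 640, db_response_time_ms = 38, page = 1, per_page = 1),
  openalex = list(count = 150000, db_response_time_ms = 52, page = 1, per_page = 1)
)

hits <- lapply(responses, unlist) |> # each response becomes a named vector
  do.call(what = cbind) |>           # one column per query
  t() |>                             # one row per query
  as.data.frame()

# Keep only the hit count; the query names stay as row names.
hits <- hits["count"]
print(hits)
```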

Counts per Language

fn <- file.path(params$corpus, "st_languages.rds")
if (!file.exists(fn)) {
  st <- list(
    spc = params$search_terms$spc |>
      paste0(collapse = " "),
    nature = params$search_terms$nature |>
      paste0(collapse = " ")
  )
  st$spcc <- paste0("(", st$spc, ") AND (", st$nature, ")")

  queries <- lapply(
    st,
    function(s) {
      openalexPro::pro_query(
        title_and_abstract.search = s,
        type = params$types_filter,
        from_publication_date = params$publication_date$from,
        to_publication_date = params$publication_date$to,
        group_by = "language"
      )
    }
  )
  queries$openalex <- openalexPro::pro_query(
    type = params$types_filter,
    from_publication_date = params$publication_date$from,
    to_publication_date = params$publication_date$to,
    group_by = "language"
  )

  pbapply::pblapply(
    queries,
    function(query) {
      query |>
        openalexR::oa_request(
          verbose = TRUE
        ) |>
        dplyr::bind_rows()
    }
  ) |>
    dplyr::bind_rows(.id = "source") |>
    dplyr::select(source, language = key_display_name, count) |>
    tidyr::pivot_wider(
      names_from = source,
      values_from = count,
      names_prefix = "count_",
      values_fill = 0
    ) |>
    dplyr::select(
      language,
      count_openalex,
      count_spc,
      count_nature,
      count_spcc
    ) |>
    dplyr::arrange(language) |>
    saveRDS(file = fn)
}

Counts per Publication Year

fn <- file.path(params$corpus, "st_years.rds")
if (!file.exists(fn)) {
  st <- list(
    spc = params$search_terms$spc |>
      paste0(collapse = " "),
    nature = params$search_terms$nature |>
      paste0(collapse = " ")
  )
  st$spcc <- paste0("(", st$spc, ") AND (", st$nature, ")")

  queries <- lapply(
    st,
    function(s) {
      openalexPro::pro_query(
        title_and_abstract.search = s,
        type = params$types_filter,
        from_publication_date = params$publication_date$from,
        to_publication_date = params$publication_date$to,
        group_by = "publication_year"
      )
    }
  )
  queries$openalex <- openalexPro::pro_query(
    type = params$types_filter,
    from_publication_date = params$publication_date$from,
    to_publication_date = params$publication_date$to,
    group_by = "publication_year"
  )

  pbapply::pblapply(
    queries,
    function(query) {
      query |>
        openalexR::oa_request(
          verbose = TRUE
        ) |>
        dplyr::bind_rows()
    }
  ) |>
    dplyr::bind_rows(.id = "source") |>
    dplyr::select(source, year = key, count) |>
    dplyr::mutate(year = base::as.integer(year)) |>
    tidyr::pivot_wider(
      names_from = source,
      values_from = count,
      names_prefix = "count_",
      values_fill = 0
    ) |>
    dplyr::select(
      year,
      count_openalex,
      count_spc,
      count_nature,
      count_spcc
    ) |>
    dplyr::arrange(dplyr::desc(year)) |>
    saveRDS(file = fn)
}
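
The per-language and per-year chunks both reshape grouped counts from long form (one row per source and group) to wide form (one column per source), filling missing combinations with 0. A base R sketch of that reshaping on made-up numbers is shown below; it uses `xtabs()` instead of `tidyr::pivot_wider()` only so that it runs without additional packages.

```r
# Hypothetical grouped counts in long form (all numbers are made up).
long <- data.frame(
  source = c("spc", "spc", "nature", "spcc", "openalex", "openalex"),
  year = c(2024L, 2025L, 2025L, 2025L, 2024L, 2025L),
  count = c(10, 12, 90, 7, 1000, 1100)
)

# Fix the column order and make sure every source gets a column.
long$source <- factor(long$source, levels = c("openalex", "spc", "nature", "spcc"))

# Sum the counts into a year x source table; missing combinations become 0.
wide <- as.data.frame.matrix(xtabs(count ~ year + source, data = long))
names(wide) <- paste0("count_", names(wide))
wide <- cbind(year = as.integer(rownames(wide)), wide)

# Most recent year first, as in the chunk above.
wide <- wide[order(-wide$year), ]
rownames(wide) <- NULL
print(wide)
```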

Results

Assessment of Search Terms

SPC Term

readRDS(file.path(params$output_dir, "searchterm_assessment_spcc.rds"))$spc |>
  dplyr::arrange(dplyr::desc(count)) |>
  dplyr::mutate(
    count = format(count, big.mark = ","),
    count_excl = format(count_excl, big.mark = ","),
  ) |>
  knitr::kable(format = "html", escape = FALSE)
| term | count | count_excl |
|---|---|---|
| "connectivity" | 460,287 | 453,529 |
| "urban planning" | 69,642 | 61,742 |
| "protected area" | 58,642 | 52,494 |
| "zoning" | 47,430 | 42,878 |
| "spatial planning" | 26,415 | 20,569 |
| "land use planning" | 22,834 | 0 |
| "land-use planning" | 22,834 | 0 |
| "environmental impact assessment" | 18,914 | 17,518 |
| "ecological restoration" | 15,844 | 13,908 |
| "regional planning" | 15,617 | 13,234 |
| "stepping stones" | 15,318 | 14,615 |
| "conservation planning" | 13,857 | 11,018 |
| "planning tools" | 13,670 | 11,846 |
| "spatial configuration" | 11,425 | 10,302 |
| "green infrastructure" | 11,246 | 9,151 |
| "spatial development" | 10,486 | 8,420 |
| "nature based solutions" | 7,852 | 0 |
| "nature-based solutions" | 7,852 | 0 |
| "landscape planning" | 6,330 | 5,121 |
| "ecological networks" | 5,787 | 4,626 |
| "multi-criteria decision analysis" | 5,554 | 5,263 |
| "ecosystem restoration" | 4,783 | 3,857 |
| "spatial transformation" | 4,643 | 4,275 |
| "scenario planning" | 4,505 | 4,212 |
| "habitat restoration" | 4,304 | 3,526 |
| "landscape management" | 4,301 | 3,599 |
| "land use management" | 3,900 | 0 |
| "land-use management" | 3,900 | 0 |
| "territorial planning" | 3,774 | 2,730 |
| "restoration planning" | 3,383 | 2,599 |
| "participatory planning" | 3,301 | 2,740 |
| "strategic environmental assessment" | 2,768 | 1,975 |
| "land allocation" | 2,690 | 2,355 |
| "spatial composition" | 2,578 | 2,369 |
| "ecosystem-based management" | 2,372 | 0 |
| "ecosystem based management" | 2,372 | 0 |
| "forest management planning" | 2,362 | 2,197 |
| "adaptive planning" | 2,316 | 2,153 |
| "land use model" | 2,187 | 0 |
| "land-use model" | 2,187 | 0 |
| "ecological corridor" | 2,060 | 1,068 |
| "rewilding" | 2,033 | 1,790 |
| "land use scenario" | 1,974 | 0 |
| "land-use scenario" | 1,974 | 0 |
| "sectorial planning" | 1,735 | 1,469 |
| "landscape restoration" | 1,701 | 1,373 |
| "integrated coastal zone management" | 1,514 | 1,300 |
| "land governance" | 1,422 | 1,290 |
| "restoration ecology" | 1,416 | 1,036 |
| "inclusive governance" | 1,410 | 1,293 |
| "functional landscapes" | 870 | 738 |
| "wildlife corridor" | 767 | 496 |
| "reserve design" | 637 | 412 |
| "blue infrastructure" | 600 | 363 |
| "spatial governance" | 581 | 398 |
| "working landscapes" | 461 | 395 |
| "marine governance" | 387 | 301 |
| "habitat corridor" | 367 | 198 |
| "community based planning" | 335 | 0 |
| "community-based planning" | 335 | 0 |
| "OECM" | 325 | 223 |
| "ecosystem service model" | 282 | 227 |
| "spatial priority" | 279 | 171 |
| "landscape governance" | 276 | 208 |
| "anticipatory planning" | 228 | 214 |
| "scenario-based planning" | 220 | 0 |
| "scenario based planning" | 220 | 0 |
| "ecosystem service mapping" | 138 | 102 |
| "land use governance" | 132 | 0 |
| "land-use governance" | 132 | 0 |
| "place-based planning" | 66 | 0 |
| "place based planning" | 66 | 0 |
| "spatial conservation priorities" | 66 | 37 |
| "remote ocean areas" | 62 | 58 |
| "sea use management" | 38 | 0 |
| "sea-use management" | 38 | 0 |
| "spatial forest planning" | 35 | 30 |
| "agricultural management planning" | 29 | 29 |
| "ecosystem service planning" | 22 | 14 |
| "ocean use management" | 22 | 21 |
| "land-sea planning" | 20 | 10 |
| "planning across scales" | 9 | 6 |
| "seascape planning" | 9 | 6 |
| "cross-scale planning" | 2 | 1 |
| "biodiversity-inclusive planning" | 1 | 0 |
| "inland waters planning" | 1 | 1 |
| "maretories" | 0 | 0 |
| "seascape governance" | 0 | 0 |

Nature Term

readRDS(file.path(params$output_dir, "searchterm_assessment_spcc.rds"))$nature |>
  dplyr::arrange(dplyr::desc(count)) |>
  dplyr::mutate(
    count = format(count, big.mark = ","),
    count_excl = format(count_excl, big.mark = ","),
  ) |>
  knitr::kable(format = "html", escape = FALSE)
| term | count | count_excl |
|---|---|---|
| "environment" | 5,612,124 | 4,333,552 |
| "species" | 3,609,619 | 2,541,003 |
| "environmental" | 3,442,467 | 2,318,758 |
| "nature" | 3,224,734 | 2,578,060 |
| "sea" | 1,249,802 | 851,316 |
| "conservation" | 826,002 | 422,236 |
| "Earth" | 814,952 | 582,018 |
| "ecosystem" | 754,734 | 335,560 |
| "ocean" | 637,143 | 356,618 |
| "restore" | 587,759 | 430,791 |
| "habitat" | 525,847 | 180,612 |
| "conserve" | 491,662 | 319,042 |
| "restoration" | 459,898 | 311,273 |
| "coast" | 404,467 | 225,036 |
| "ecology" | 397,625 | 193,370 |
| "biodiversity" | 272,814 | 68,087 |
| "planet" | 197,755 | 117,274 |
| "protected area" | 58,642 | 18,872 |
| "biosphere" | 42,097 | 16,117 |
| "biodiverse" | 4,211 | 570 |
| "nature's contribution to people" | 232 | 33 |
| "nature futures framework" | 68 | 0 |

Key Papers in Corpus

No filtering is applied, neither by type nor by publication year, so only the search terms themselves are evaluated. A paper listed in this table is therefore not necessarily part of the final SPC Corpus, which is additionally filtered by date and type.
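
To illustrate why a match on the search terms does not guarantee membership in the final corpus, the sketch below applies the type and date filters stated in the Methods section to three made-up key papers. The papers and the column `in_final_corpus` are hypothetical.

```r
# Hypothetical key papers that all match the combined search term (made-up metadata).
kp <- data.frame(
  id = c("W1", "W2", "W3"),
  type = c("article", "editorial", "review"),
  publication_date = as.Date(c("2015-06-01", "2020-01-15", "1989-03-10")),
  in_spcc = c(TRUE, TRUE, TRUE)
)

# Filters as stated in the Methods section.
included_types <- c(
  "article", "book", "book-chapter", "dissertation",
  "preprint", "report", "review"
)
date_from <- as.Date("1992-01-01")
date_to <- as.Date("2025-12-31")

# A key paper reaches the final corpus only if it passes every filter.
kp$in_final_corpus <- kp$in_spcc &
  kp$type %in% included_types &
  kp$publication_date >= date_from &
  kp$publication_date <= date_to
print(kp)
```

In this example W2 is dropped because of its type and W3 because of its publication date, although both match the search terms.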

readRDS(file.path(params$keyworks, "kp_found_in.rds")) |>
  dplyr::mutate(
    id_display = sub("^.*/(W[0-9]+)$", "\\1", id),
    id = sprintf("<a href=\"%s\" target=\"_blank\">%s</a>", id, id_display),
    doi_display = sub("^https://doi\\.org/", "", doi),
    doi = sprintf("<a href=\"%s\" target=\"_blank\">%s</a>", doi, doi_display)
  ) |>
  dplyr::arrange(
    in_spcc,
    in_spc,
    in_nature,
    in_openalex
  ) |>
  dplyr::mutate(
    dplyr::across(
      dplyr::starts_with("in_"),
      ~ dplyr::case_when(
        .x ~ '<b style="color:#008000;">☑</b>', # green bold checkbox
        !.x ~ '<b style="color:#cc0000;">☐</b>' # red bold empty checkbox
      )
    )
  ) |>
  dplyr::select(
    id,
    doi,
    citation,
    in_spcc,
    in_spc,
    in_nature,
    in_openalex
  ) |>
  knitr::kable(format = "html", escape = FALSE)
| id | doi | citation | in_spcc | in_spc | in_nature | in_openalex |
|---|---|---|---|---|---|---|
| W4233598570 | 10.1146/annurev-anthro-102218-011435 | David Valentine & Amelia Hassoun (2019) | | | | |
| W2056591496 | 10.1016/s0016-3287(98)00101-3 | Richard A. Slaughter (1998) | | | | |
| W2090312322 | 10.1016/j.futures.2007.11.010 | Joseph Voros (2007) | | | | |
| W1992974326 | 10.1016/j.futures.2007.11.011 | Richard A. Slaughter (2007) | | | | |
| W2884329716 | 10.1007/978-3-319-94021-2 | László Miklós & Anna Špinerová (2018) | | | | |
| W2128217744 | 10.1177/0306312713508669 | David H. Guston (2013) | | | | |
| W2057938345 | 10.1108/14636680810855991 | Sohail Inayatullah (2008) | | | | |
| W1973628253 | 10.1016/j.enpol.2005.12.006 | Will McDowall & Malcolm Eames (2006) | | | | |
| W3170399214 | 10.1016/j.mex.2021.101401 | Daniel Beiderbeck et al. (2021) | | | | |
| W2964311550 | 10.1016/j.techfore.2019.07.002 | Ian Belton et al. (2019) | | | | |
| W4238224782 | 10.1016/j.futures.2015.08.007 | I. Milojević & Sohail Inayatullah (2015) | | | | |
| W2114666631 | 10.1525/bio.2010.60.3.7 | Timothy J. Beechie et al. (2010) | | | | |
| W2062482344 | 10.1073/pnas.1201040109 | Joshua Goldstein et al. (2012) | | | | |
| W4382181311 | 10.1007/s11625-023-01316-1 | América Paz Durán et al. (2023) | | | | |
| W4395447577 | 10.1126/science.adn3441 | Henrique M. Pereira et al. (2024) | | | | |
| W4212886831 | 10.1111/geb.13459 | Karel Mokany et al. (2022) | | | | |
| W2015619301 | 10.1073/pnas.1000530107 | Lian Pin Koh & Jaboury Ghazoul (2010) | | | | |
| W3074014079 | 10.1007/s10113-020-01685-8 | Clara J. Veerkamp et al. (2020) | | | | |
| W4382751812 | 10.1016/j.biocon.2023.110068 | Marcel Kok et al. (2023) | | | | |
| W2972648221 | 10.1111/1365-2664.13506 | Karel Mokany et al. (2019) | | | | |
| W3044450114 | 10.1016/j.envsoft.2020.104806 | Andrew J. Hoskins et al. (2020) | | | | |
| W3145276257 | 10.1016/j.jenvman.2021.112400 | Yuyoung Choi et al. (2021) | | | | |
| W4298615974 | 10.1007/s10980-022-01534-5 | Tom Harwood et al. (2022) | | | | |
| W4414957974 | 10.1073/pnas.2501695122 | Damaris Zurell et al. (2025) | | | | |
| W2255223904 | NA | Bob Scholes (2010) | | | | |
| W4295308933 | 10.1073/pnas.2203385119 | Natalia Estrada-Carmona et al. (2022) | | | | |
| W2795786882 | 10.1007/s00267-018-1028-3 | Ida N.S. Djenontin & Alison M. Meadow (2018) | | | | |
| W3215595594 | 10.1093/biosci/biab091 | Abigail J. Lynch et al. (2021) | | | | |
| W4412257196 | 10.5281/zenodo.6522392 | Unai Pascual et al. (2022) | | | | |
| W4415164578 | 10.1098/rsos.250810 | Rachael Garrett et al. (2025) | | | | |
| W2093445010 | 10.1126/science.1258832 | Jianguo Liu et al. (2015) | | | | |
| W3085006993 | 10.1002/pan3.10146 | Laura Pereira et al. (2020) | | | | |
| W2754686867 | 10.1038/s41559-017-0273-9 | Isabel M.D. Rosa et al. (2017) | | | | |
| W4380362703 | 10.1016/j.gloenvcha.2023.102681 | Hyejin Kim et al. (2023) | | | | |
| W2086673960 | 10.1016/j.tree.2009.04.008 | William J. Sutherland & Harry J. Woodroof (2009) | | | | |
W4210386268 10.1016/j.envsci.2022.01.013 Andressa V. Mansur et al. (2022)
W4210765186 10.1038/s41893-021-00844-x Roslyn Henry et al. (2022)
W4310004272 10.1007/s11625-022-01251-7 Lucas Rutting et al. (2022)
W4211108983 10.1007/978-3-030-20024-4 Davide Geneletti et al. (2019)
W2904898541 10.1007/s10668-018-00300-5 Azime Tezer et al. (2018)
W4404646473 10.1007/s00267-024-02086-x Nina Farwig et al. (2024)
W2784805535 10.1126/science.aam9712 Marlee A. Tucker et al. (2018)
W4390696637 10.1038/s41467-023-43832-9 Rachel Neugarten et al. (2024)
W4392367143 10.1111/brv.13066 Steven J. Cooke et al. (2024)
W2161139387 10.1126/science.1196624 Henrique M. Pereira et al. (2010)
W2028797766 10.1016/j.cosust.2013.05.002 Ralf Seppelt et al. (2013)
W2159760863 10.1111/j.1466-8238.2010.00620.x Joachim H. Spangenberg et al. (2011)
W1990241575 10.1007/s10021-004-0074-2 Paul Raskin (2005)
W2015646994 10.1016/j.tree.2014.07.005 Carly N. Cook et al. (2014)
W2902463689 10.1016/j.tree.2018.10.006 Emily Nicholson et al. (2018)
W4411717980 10.1007/s11625-025-01682-y Sana Okayasu et al. (2025)
W4386734502 10.1146/annurev-environ-112321-095011 Steven J. Cork et al. (2023)
W3200717663 10.1007/s10980-021-01329-0 Jianquan Dong et al. (2021)
W4407294089 10.1080/02697459.2025.2459975 Romina Rodela (2025)
W3209482460 10.25607/obp-1666 Alejandro Iglesias-Campos et al. (2021)
W2116090915 10.1016/j.futures.2012.10.003 Muhammad Amer et al. (2012)
W4386609980 10.1038/s43588-023-00503-5 Yu Zheng et al. (2023)
W2099188808 10.1016/j.landurbplan.2006.04.005 Jolande W. Termorshuizen et al. (2006)
W2910481941 10.1080/02513625.2018.1562795 Peter Schmitt & Thorsten Wiechmann (2018)
W4410629764 10.1016/j.rsma.2025.104257 Liisi Lees et al. (2025)
W2614376759 10.1016/j.envsci.2017.05.003 Vanessa M. Adams et al. (2017)
W2971398159 10.1111/rec.13035 George D. Gann et al. (2019)
W1540682446 10.1111/brv.12008 Aija S. Kukkala & Atte Moilanen (2012)
W3161819570 10.1126/science.abc4896 Louise O’Connor et al. (2021)
W2952896657 10.5194/gmd-11-4537-2018 Hyejin Kim et al. (2018)
W3151200781 10.1111/rec.13403 Ben L. Gilby et al. (2021)
W3126018057 10.1111/rec.13346 Jordi Cortina et al. (2021)
W3185562566 10.1038/d41586-021-02041-4 Georgina G. Gurney et al. (2021)
W4406904698 10.1016/j.tree.2024.12.002 Sylvaine Giakoumi et al. (2025)
W2769947232 10.1016/j.cosust.2017.10.004 Jean Paul Metzger et al. (2017)
W2978599153 NA Peter H. Verburg et al. (2019)
W4320016094 10.1007/978-3-031-15773-8_4 Falko Buschke et al. (2023)
W2046569818 10.1007/s10980-014-0052-9 Christine Fürst et al. (2014)
W2766457534 10.1016/j.landusepol.2017.10.017 Chiara Cortinovis & Davide Geneletti (2017)
W2035832288 10.1007/s10980-014-0085-0 Christian Albert et al. (2014)
W2592317409 10.1080/21513732.2017.1296494 Justice Nana Inkoom et al. (2017)
W2612157793 10.1016/j.marpol.2017.06.020 Mara Ntona & Elisa Morgera (2017)
W3134065015 10.1016/j.envsci.2021.02.001 Davide Longato et al. (2021)
W2902345284 10.1007/s10980-018-0745-6 Marcin Spyra et al. (2018)
W3158973852 10.1016/j.landurbplan.2021.104129 Chiara Cortinovis et al. (2021)
W2999493939 10.1016/j.landurbplan.2019.103741 Christian Albert et al. (2020)
W3159060995 10.1016/j.ecoser.2021.101273 Karsten Grunewald et al. (2021)
W2810876831 10.3897/rio.4.e28045 Evelyn Underwood et al. (2018)
W2771003204 10.1080/21513732.2017.1396257 Christine Fürst et al. (2017)
W4376627395 10.1016/j.marpol.2023.105655 Julie Reimer et al. (2023)
W4392293717 10.1016/j.ecolind.2024.111816 Wen Song et al. (2024)
W2804302736 10.1016/j.scitotenv.2018.05.147 Maria da Luz Fernandes et al. (2018)
W4289516997 10.1007/s41207-022-00315-5 Georgia Pozoukidou et al. (2022)
W2604803374 10.1093/biosci/bix012 Charles H. Nilon et al. (2017)
W4406477943 10.1007/s11252-024-01656-5 Israa H. Mahmoud et al. (2025)
W4220936177 10.1016/j.oneear.2022.02.008 Christopher M. Raymond et al. (2022)
W2122104680 10.1111/j.1523-1739.2009.01212.x William J. Sutherland et al. (2009)
W4211243502 10.1038/35012251 Chris Margules & Robert L. Pressey (2000)
W1963746476 10.1016/j.ecocom.2009.10.006 R.S. de Groot et al. (2009)
W4297536966 10.1016/j.tree.2022.09.002 Maria Beger et al. (2022)
W2956763155 10.1088/1748-9326/ab3234 Annika T. H. Keeley et al. (2019)
W2487200415 10.1016/j.marpol.2016.06.023 Elianny Domínguez-Tejo et al. (2016)
W3100416804 10.1111/1365-2664.13796 Virgilio Hermoso et al. (2020)
W4313593374 10.1016/j.mex.2022.101989 Holly Kirk et al. (2023)
W2554309037 10.1111/btp.12386 Agnieszka E. Latawiec et al. (2016)
W2759207970 10.3390/su9091668 Leena Karrasch et al. (2017)
W4393866290 10.3390/su16072965 Qiqi Hu et al. (2024)
W4387055605 10.1038/s44183-023-00022-w Julie Reimer et al. (2023)
W4281717750 10.1126/science.abl8974 Angela Brennan et al. (2022)
W2918948909 10.1016/j.gecco.2019.e00569 Arieanna C. Balbar & Anna Metaxas (2019)
W1966247992 10.1080/21513732.2011.617711 Davide Geneletti (2011)
W2766942546 10.1080/08920753.2017.1373450 Kekuewa Kikiloi et al. (2017)
W2265414043 10.1146/annurev.ecolsys.32.081501.114012 Steward T. A. Pickett et al. (2001)
W2027491594 10.1016/j.ecolind.2015.03.029 Christian Albert et al. (2015)
W4409833817 10.1126/science.adn2225 Jedediah F. Brodie et al. (2025)
W2524916285 10.1002/aqc.2645 Alan M. Friedlander et al. (2016)
W3194313759 10.1007/978-94-024-1681-7 Christina von Haaren et al. (2019)
W3021127236 10.1016/j.marpol.2020.103950 Thomas Robertson et al. (2020)
W4412750811 10.1016/j.marpol.2025.106852 Jean‐Eudes Beuret et al. (2025)
W4413900892 10.1016/j.tree.2025.07.014 Jian Peng et al. (2025)
W2783683111 10.1016/j.biocon.2017.12.020 Santiago Saura et al. (2018)
W1880093763 10.1111/ecog.01507 Rafael A. Magris et al. (2015)
W3132073286 10.1016/j.biocon.2021.109008 Annika T. H. Keeley et al. (2021)
W2965201645 10.1016/j.biocon.2019.07.028 Santiago Saura et al. (2019)
W3085078608 10.1038/s41467-020-18457-x Michelle Ward et al. (2020)
W2791599583 10.1111/conl.12439 Rafael A. Magris et al. (2018)
W4302293680 10.1111/csp2.12823 David M. Theobald et al. (2022)
W1990273831 10.1126/science.1242552 Silke Bauer & Bethany J. Hoye (2014)
W4412704230 10.1073/pnas.2410937122 Robin Naidoo et al. (2025)
W4280541891 10.3389/fevo.2022.830822 Sylvia Wood et al. (2022)
W4413449508 10.1038/s41467-025-63205-8 Jedediah F. Brodie et al. (2025)
W4402144060 10.1002/ece3.70231 Amanda Liczner et al. (2024)
W4414925393 10.1016/j.tree.2025.09.007 Sandra Neubert et al. (2025)
W4410131928 10.3354/meps14888 Susanne E. Tanner et al. (2025)
W4415240687 10.1007/s10980-025-02210-0 Tamsin L. Woodman et al. (2025)
W2023339029 10.1046/j.1523-1739.2003.01491.x Garry Peterson et al. (2003)
W2470861343 10.1016/j.landurbplan.2016.05.003 Adrienne Grêt‐Regamey et al. (2016)
W56780107 10.1007/978-1-4612-0529-6_10 Jack Ahern (1999)
W2073677603 10.1016/j.biocon.2015.02.015 Sebastián Martinuzzi et al. (2015)
W2156111479 10.1111/gcb.12383 Sebastián Martinuzzi et al. (2013)
W4406487214 10.1007/s10980-024-02042-4 Jiangxiao Qiu et al. (2025)
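
The table above can be condensed into one coverage figure per corpus. The following is a minimal sketch, assuming that the `in_*` columns of `kp_found_in.rds` are logical, as the `case_when()` call in the code above suggests. The data frame `kp` and its values are invented for illustration and do not reflect the real key papers.

```r
# Hypothetical stand-in for kp_found_in.rds: one row per key paper,
# one logical in_* column per search (values are made up)
kp <- data.frame(
  id          = c("W1", "W2", "W3", "W4"),
  in_spcc     = c(TRUE, FALSE, FALSE, TRUE),
  in_spc      = c(TRUE, TRUE, FALSE, TRUE),
  in_nature   = c(TRUE, FALSE, TRUE, TRUE),
  in_openalex = c(TRUE, TRUE, TRUE, TRUE)
)

# Select the membership columns and count the TRUE values in each
flag_cols <- grep("^in_", names(kp), value = TRUE)
coverage <- data.frame(
  corpus = sub("^in_", "", flag_cols),
  found  = unname(colSums(kp[flag_cols], na.rm = TRUE)),
  total  = nrow(kp)
)

# Share of key papers retrieved by each search
coverage$share <- coverage$found / coverage$total
print(coverage)
```

Applied to the real file, a low share for `spcc` combined with a high share for `spc` would point to the nature terms, rather than the planning and connectivity terms, as the reason key papers are missed.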

SPC Corpus Measures and Numbers

These data are gathered directly from OpenAlex, without downloading any works. They are used to assess the quality of the SPC Corpus.

The query contains:

  • the search term (nature, spc, spcc)
  • the types selected (article, book, book-chapter, dissertation, preprint, report, review)
  • the date range (from 1992-01-01 to 2025-12-31)
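
The three query components could be combined into a single OpenAlex request roughly as follows. This is a sketch only: `build_count_url()` is a hypothetical helper that is not part of the report's code base, the search term shown is a placeholder rather than the actual spc, nature or spcc term, and it is an assumption that the date range refers to the publication date. The filter names `title_and_abstract.search`, `type`, `from_publication_date` and `to_publication_date` are those of the OpenAlex works endpoint.

```r
# Hypothetical helper: assemble a works URL that restricts a title and abstract
# search by work type and publication date. No request is sent here.
build_count_url <- function(search_term, types, from, to) {
  filter <- paste(
    paste0("title_and_abstract.search:", search_term),
    paste0("type:", paste(types, collapse = "|")), # "|" means OR in OpenAlex
    paste0("from_publication_date:", from),
    paste0("to_publication_date:", to),
    sep = "," # "," means AND between filters
  )
  paste0(
    "https://api.openalex.org/works?filter=",
    utils::URLencode(filter, reserved = FALSE),
    "&per-page=1"
  )
}

types <- c(
  "article", "book", "book-chapter", "dissertation",
  "preprint", "report", "review"
)

# Placeholder search term, used for illustration only
url <- build_count_url(
  search_term = "\"protected area\" OR biodiversity",
  types = types,
  from = "1992-01-01",
  to = "2025-12-31"
)
print(url)
```

Fetching such a URL returns a JSON document whose `meta$count` field holds the number of matching works, which is how counts can be obtained without downloading the works themselves.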

Overall counts

readRDS(file.path(params$corpus, "st_hits.rds")) |>
  dplyr::mutate(
    count = format(count, big.mark = ",")
  ) |>
  knitr::kable(format = "html", escape = FALSE)
count
spc 817,150
nature 16,831,704
spcc 316,998
openalex 205,618,381
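
To put these counts in proportion, the following sketch computes simple ratios from the numbers in the table above. It assumes that the `openalex` row is the total number of works for the same types and date range, and that `spcc` is the intersection of the `spc` and `nature` searches, as the schematic overview suggests. If either assumption does not hold, the ratios are not meaningful.

```r
# Counts copied from the table above
hits <- c(
  spc      = 817150,
  nature   = 16831704,
  spcc     = 316998,
  openalex = 205618381
)

# Share of each search in the OpenAlex total
share_of_openalex <- hits[c("spc", "nature", "spcc")] / hits[["openalex"]]

# Fraction of the spc and of the nature hits that remain in spcc
# (assumes spcc = spc AND nature)
spcc_retained <- hits[["spcc"]] / hits[c("spc", "nature")]

print(round(100 * share_of_openalex, 2)) # percentages
print(round(100 * spcc_retained, 2))     # percentages
```

With these counts, `spcc` retains roughly 39% of the `spc` hits and makes up about 0.15% of the OpenAlex total.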

Publication Years

readRDS(file.path(params$corpus, "st_years.rds")) |>
  dplyr::mutate(
    dplyr::across(
      dplyr::starts_with("count_"),
      ~ base::format(.x, big.mark = ",")
    )
  ) |>
  knitr::kable(format = "html", escape = FALSE)
year count_openalex count_spc count_nature count_spcc
2025 6,238,678 47,767 740,853 22,345
2024 8,628,798 58,341 1,018,186 25,573
2023 8,698,025 57,058 1,028,635 23,714
2022 7,751,702 51,353 923,218 20,894
2021 8,698,617 53,218 942,833 21,035
2020 9,542,282 50,111 914,885 19,012
2019 9,282,492 44,645 816,583 16,741
2018 9,060,432 40,817 756,636 15,150
2017 9,008,090 37,669 705,257 13,827
2016 9,196,091 35,877 688,657 13,261
2015 8,969,721 34,209 681,556 12,660
2014 8,878,581 33,531 682,758 12,835
2013 8,605,745 31,662 660,254 11,940
2012 8,143,521 28,659 618,795 11,213
2011 7,877,400 26,595 582,558 10,126
2010 7,344,239 23,915 539,602 9,129
2009 6,872,587 20,705 487,532 7,965
2008 6,358,201 18,358 443,181 6,965
2007 5,911,130 16,187 403,977 6,102
2006 5,574,062 14,510 377,466 5,265
2005 5,105,616 13,063 343,339 4,903
2004 4,673,271 11,121 309,091 4,069
2003 4,343,325 10,371 287,303 3,768
2002 4,152,643 9,772 283,751 3,309
2001 3,562,560 7,275 221,308 2,670
2000 3,394,884 6,707 208,037 2,322
1999 2,945,662 5,389 180,150 1,843
1998 2,808,879 4,977 168,916 1,624
1997 2,679,761 4,544 158,089 1,481
1996 2,551,167 4,511 152,580 1,397
1995 2,375,803 3,950 140,460 1,174
1994 2,238,823 3,632 128,767 996
1993 2,123,181 3,387 121,619 866
1992 2,021,948 3,113 112,620 745

This graph shows only the relative number of publications per year, so that trends in sources of very different size can be compared.

readRDS(file.path(params$corpus, "st_years.rds")) |>
  tidyr::pivot_longer(
    cols = dplyr::starts_with("count_"),
    names_to = "source",
    values_to = "count",
    names_prefix = "count_"
  ) |>
  # scale each source so its total sum is 1
  dplyr::group_by(source) |>
  dplyr::mutate(
    total_count = base::sum(count, na.rm = TRUE),
    count = dplyr::if_else(total_count > 0, count / total_count, 0)
  ) |>
  dplyr::ungroup() |>
  dplyr::select(-total_count) |>
  ggplot2::ggplot(ggplot2::aes(x = year, y = count, color = source)) +
  ggplot2::geom_line(linewidth = 1) +
  ggplot2::geom_point(size = 1.5) +
  ggplot2::scale_y_continuous(
    labels = scales::label_percent(accuracy = 1),
    expand = ggplot2::expansion(mult = c(0.02, 0.06))
  ) +
  ggplot2::labs(
    x = "Year",
    y = "Share of total works (sum-scaled per source)",
    color = "Source",
    title = "Publications per year by source (each source sums to 1)"
  ) +
  ggplot2::theme_minimal(base_size = 14) +
  ggplot2::theme(
    legend.position = "bottom",
    panel.grid.minor = ggplot2::element_blank()
  )

Language

readRDS(file.path(params$corpus, "st_languages.rds")) |>
  dplyr::mutate(
    dplyr::across(
      dplyr::starts_with("count_"),
      ~ base::format(.x, big.mark = ",")
    )
  ) |>
  knitr::kable(format = "html", escape = FALSE)
language count_openalex count_spc count_nature count_spcc
Afrikaans 151,411 39 1,352 16
Albanian 13,441 1 59 0
Arabic 629,566 88 3,257 35
Bengali 2,831 2 39 2
Bulgarian 52,148 22 625 7
Catalan 546,849 389 8,458 274
Chinese 4,511,228 263 5,240 39
Croatian 327,706 187 2,572 41
Czech 411,580 42 1,033 14
Danish 271,858 41 1,408 5
Dutch 759,899 150 4,028 29
English 146,688,929 797,903 16,209,547 309,623
Estonian 107,259 21 989 11
Finnish 172,516 21 708 7
French 5,674,578 1,642 172,279 691
German 5,116,198 794 15,167 208
Gujarati 433 0 1 0
Hebrew 3,845 7 99 5
Hindi 10,134 0 22 0
Hungarian 132,278 72 1,863 18
Indonesian 3,063,755 3,781 80,917 1,318
Italian 1,479,762 570 9,394 208
Japanese 6,356,532 499 6,687 65
Kannada 383 0 0 0
Korean 3,772,022 575 8,497 95
Latvian 25,349 2 91 0
Lithuanian 95,891 80 1,018 33
Macedonian 27,115 10 157 2
Malayalam 308 0 6 0
Marathi 2,225 0 5 0
Modern Greek (1453-) 139,854 73 912 15
Nepali (macrolanguage) 3,418 0 52 0
Norwegian 251,962 34 776 5
Panjabi 117 0 0 0
Persian 493,316 101 1,193 36
Polish 1,118,905 159 3,311 51
Portuguese 3,984,017 1,615 33,446 903
Romanian 257,461 47 2,103 19
Russian 2,266,943 2,051 38,678 666
Slovak 51,220 13 201 8
Slovenian 104,045 70 730 29
Somali 19,536 2 36 1
Spanish 7,138,372 3,851 166,248 1,946
Swahili (macrolanguage) 24,069 2 40 1
Swedish 430,826 46 1,297 13
Tagalog 90,089 67 411 2
Tamil 3,049 0 80 0
Telugu 64 0 2 0
Thai 95,819 137 3,340 54
Turkish 868,254 224 5,108 64
Ukrainian 507,916 250 6,002 92
Urdu 2,608 0 4 0
Vietnamese 164,476 61 1,124 14
Welsh 42,793 4 101 1
NA 26,846 2 15 0

This graph shows only the relative number of publications per language for the 15 most frequent languages, so that the language profiles of the different sources can be compared.

readRDS(file.path(params$corpus, "st_languages.rds")) |>
  # keep top 15 languages by total (before scaling)
  dplyr::mutate(
    total = count_openalex + count_spc + count_nature + count_spcc
  ) |>
  dplyr::slice_max(total, n = 15) |>
  dplyr::arrange(dplyr::desc(total)) |>
  # fix display order so largest stays on top
  dplyr::mutate(language = factor(language, levels = rev(language))) |>
  dplyr::select(-total) |>
  # reshape wide → long
  tidyr::pivot_longer(
    cols = dplyr::starts_with("count_"),
    names_to = "source",
    values_to = "count",
    names_prefix = "count_"
  ) |>
  # scale so each source sums to 1
  dplyr::group_by(source) |>
  dplyr::mutate(
    total_source = base::sum(count, na.rm = TRUE),
    count = dplyr::if_else(total_source > 0, count / total_source, 0)
  ) |>
  dplyr::ungroup() |>
  dplyr::select(-total_source) |>
  ggplot2::ggplot(ggplot2::aes(x = language, y = count, fill = source)) +
  ggplot2::geom_col(position = "dodge") +
  ggplot2::coord_flip() +
  ggplot2::scale_y_continuous(
    labels = scales::label_percent(accuracy = 1),
    expand = ggplot2::expansion(mult = c(0, 0.05))
  ) +
  ggplot2::labs(
    x = "Language",
    y = "Share of total works within each source",
    fill = "Source",
    title = "Publications by language (top 15), scaled so each source sums to 1"
  ) +
  ggplot2::theme_minimal(base_size = 14) +
  ggplot2::theme(
    legend.position = "bottom",
    panel.grid.minor = ggplot2::element_blank()
  )

Reuse

Citation

BibTeX citation:
@report{krug,
  author = {Krug, Rainer M. and Bishop, Gabriella and Villasante,
    Sebastian},
  title = {Spatial {Planning} and {Connectivity} {Corpus} - {Technical}
    {Background} {Report}},
  doi = {10.5281/zenodo.XXXXX},
  langid = {en},
  abstract = {To Be added}
}
For attribution, please cite this work as:
Krug, Rainer M., Gabriella Bishop, and Sebastian Villasante. n.d. “Spatial Planning and Connectivity Corpus - Technical Background Report.” IPBES Spatial Planning and Connectivity Assessment. https://doi.org/10.5281/zenodo.XXXXX.